App review analysis¶

This is an analysis of a Swedish 'buy now, pay later' company's app on the Google Play Store¶

To scrape reviews and scores, the google_play_scraper package was employed (https://pypi.org/project/google-play-scraper/)¶

And this is my attempt to visualise what people are saying about the app¶

The overarching plan for this analysis is to understand:

  1. Can the reviews be quantified?
  2. Are we able to gauge opinions?
  3. Can we visualise items or topics of interest in reviews?
    • Specifically, can we visualise them in a way that is engaging, fun and easy to understand

Basic cleaning, merging and concatenating¶

In [2]:
import pandas as pd
import numpy as np
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from collections import Counter
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
import seaborn as sns
from langdetect import detect, LangDetectException
import string
In [3]:
df_gb = pd.read_csv('app-reviews_en-gb.csv', sep='|')
df_us = pd.read_csv('app-reviews_en-us.csv', sep='|')
In [73]:
# Check for duplicates within each df
duplicates_in_df_gb = df_gb[df_gb.duplicated(subset=['reviewId', 'userName', 'appVersion'], keep=False)]
duplicates_in_df_us = df_us[df_us.duplicated(subset=['reviewId', 'userName', 'appVersion'], keep=False)]

# Check for duplicates between the dfs
merged_df = pd.merge(df_gb, df_us, on='reviewId', how='inner')

# Check for duplicate 'reviewId' values in the merged df
duplicates_between_dfs = merged_df[merged_df.duplicated(subset='reviewId', keep=False)]

print("Duplicates within df_gb:")
print(duplicates_in_df_gb)
print("-"*70)
print("Duplicates within df_us:")
print(duplicates_in_df_us)
print("-"*70)
print("Duplicates between the DataFrames:")
print(duplicates_between_dfs)
Duplicates within df_gb:
Empty DataFrame
Columns: [reviewId, userName, userImage, content, score, thumbsUpCount, reviewCreatedVersion, at, replyContent, repliedAt, appVersion]
Index: []
----------------------------------------------------------------------
Duplicates within df_us:
Empty DataFrame
Columns: [reviewId, userName, userImage, content, score, thumbsUpCount, reviewCreatedVersion, at, replyContent, repliedAt, appVersion]
Index: []
----------------------------------------------------------------------
Duplicates between the DataFrames:
Empty DataFrame
Columns: [reviewId, userName_x, userImage_x, content_x, score_x, thumbsUpCount_x, reviewCreatedVersion_x, at_x, replyContent_x, repliedAt_x, appVersion_x, userName_y, userImage_y, content_y, score_y, thumbsUpCount_y, reviewCreatedVersion_y, at_y, replyContent_y, repliedAt_y, appVersion_y]
Index: []

[0 rows x 21 columns]
In [4]:
result_df = pd.concat([df_gb, df_us], ignore_index=True)

duplicates_in_result = result_df[result_df.duplicated(subset=['reviewId', 'userName', 'appVersion'], keep=False)]
print("Duplicates within result:", len(duplicates_in_result))
Duplicates within result: 59368

Quite a few duplicates¶

  • It seems the script used to scrape reviews from the Google Play Store wasn't perfect
  • Or there is some bleeding between the UK and US Google Play stores

Never mind, though, as this is a test and we still have more than 30,000 records after removing duplicates 👇¶

In [5]:
result_df_no_duplicates = result_df.drop_duplicates(subset=['reviewId', 'userName', 'appVersion'], keep='first')

print("Shape of df after removing duplicates:", result_df_no_duplicates.shape)
Shape of df after removing duplicates: (30316, 11)
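For clarity, `drop_duplicates(subset=..., keep='first')` boils down to keep-the-first-row-per-key logic. A pure-Python sketch of the same idea on toy rows (illustrative data only, not the real reviews):

```python
def drop_duplicates_keep_first(rows, key_fields):
    """Keep only the first row seen for each unique key tuple,
    mirroring pandas' drop_duplicates(subset=..., keep='first')."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

# Toy rows: the second entry duplicates the first on all three key fields
rows = [
    {"reviewId": "a1", "userName": "Sam", "appVersion": "2.0", "score": 5},
    {"reviewId": "a1", "userName": "Sam", "appVersion": "2.0", "score": 5},
    {"reviewId": "b2", "userName": "Kim", "appVersion": "2.1", "score": 1},
]
deduped = drop_duplicates_keep_first(rows, ["reviewId", "userName", "appVersion"])
print(len(deduped))  # → 2
```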

Initialising the main df for use throughout¶

In [6]:
df = result_df_no_duplicates

Sentiment analysis¶

How does the rating of an app affect people's sentiment when writing a review?¶

Using TextBlob we can look at basic natural language while also:¶
  • Calculating a sentiment score
    • -1.0 to 1.0 (Negative - Positive) with 0.0 being Neutral
  • We can group sentiment scores by star rating and see how people talk about the app based on their reviews
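As a rough illustration of what a polarity score in [-1.0, 1.0] looks like, here is a toy lexicon average. This is only a sketch of the idea, not TextBlob's actual pattern-based algorithm:

```python
# Toy polarity scorer: average the polarity of known words.
# Illustrates the [-1.0, 1.0] scale only; TextBlob itself uses a
# much richer pattern-based lexicon under the hood.
LEXICON = {"great": 0.8, "love": 0.5, "good": 0.7,
           "bad": -0.7, "terrible": -1.0, "slow": -0.3}

def toy_polarity(text):
    scores = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0  # neutral if no hits

print(toy_polarity("great app love it"))  # → 0.65
print(toy_polarity("terrible and slow"))  # → -0.65
print(toy_polarity("it opens"))           # → 0.0
```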
In [7]:
from textblob import TextBlob

sdf = df.copy()

# Function to calculate sentiment polarity
def calculate_sentiment(text):
    return TextBlob(text).sentiment.polarity

# Calculate sentiment polarity for each review
sdf['sentiment'] = sdf['content'].dropna().apply(calculate_sentiment)

# Group by rating and calculate average sentiment
avg_sentiment_by_rating = sdf.groupby('score')['sentiment'].mean().reset_index()

avg_sentiment_by_rating
Out[7]:
score sentiment
0 1 -0.043658
1 2 0.028367
2 3 0.082904
3 4 0.305454
4 5 0.485364
In [16]:
plt.figure(figsize=(10, 5))
plt.bar(avg_sentiment_by_rating['score'], avg_sentiment_by_rating['sentiment'], color='pink')
plt.xlabel('Score')
plt.ylabel('Average Sentiment')
plt.title('Average Sentiment by Score')
plt.xticks(avg_sentiment_by_rating['score'])
plt.grid(axis='y')

plt.show()
No description has been provided for this image

Some findings¶

  • Reviewers on average do not take a completely negative tone when giving the app a bad score (1)
  • Every score from 2 and above is positive in some way shape or form
  • Reviews with a score of 5 (sentiment: 0.49) are only moderately positive on average

There seems to be a clear correlation between review sentiment and app score¶

What's the spread of scoring like?¶

Polar scoring¶

  • People either love it or hate it
  • Despite the high number of score == 1, it doesn't necessarily mean people are talking negatively about the app
    • It likely misses a feature the reviewer wants, or minor bugs make the experience less than favourable
  • It may also speak to human psychology, in that reviewers would perhaps prefer a binary system (like/dislike)
In [8]:
plt.figure(figsize=(10, 6))
ax = sns.countplot(data=sdf, x='score')

# Add count labels above each bar
for p in ax.patches:
    ax.annotate(f'{int(p.get_height())}', (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='baseline', fontsize=8, color='black', xytext=(0, 5),
                textcoords='offset points')

plt.xlabel('Rating')
plt.ylabel('Count')
plt.title('App Ratings')
plt.xticks(range(0, 5), labels=[1, 2, 3, 4, 5])

plt.show()
No description has been provided for this image

Does sentiment and score correlate generally over time?¶

In [71]:
non_nan_count = df['content'].notna().sum()
print("Number of non-NaN rows in 'content' column:", non_nan_count)
Number of non-NaN rows in 'content' column: 30316
In [77]:
tdf = df.copy()

tdf['at'] = pd.to_datetime(tdf['at'], format='%Y-%m-%d %H:%M:%S')
tdf['content'] = tdf['content'].fillna("")  # avoid chained-assignment warnings from inplace fillna
tdf['sentiment'] = tdf['content'].apply(lambda x: TextBlob(str(x)).sentiment.polarity)
tdf[['rating', 'sentiment']] = tdf[['score', 'sentiment']].apply(pd.to_numeric, errors='coerce')

tdf.set_index('at', inplace=True)

# Resample by day and calculate mean for 'rating' and 'sentiment'
tdf_resampled = tdf[['rating', 'sentiment']].resample('D').mean().reset_index()

# Calculate EMAs for both 'rating' and 'sentiment'
ema_span = 30.417  # Approximately one month
tdf_resampled['ema_rating'] = tdf_resampled['rating'].ewm(span=ema_span).mean()
tdf_resampled['ema_sentiment'] = tdf_resampled['sentiment'].ewm(span=ema_span).mean()

plt.figure(figsize=(15, 7))

# Plot EMA of rating
ax1 = plt.gca()  # Get current axis
ax2 = ax1.twinx()  # Create another axis that shares the same x-axis

ax1.plot(tdf_resampled['at'], tdf_resampled['ema_rating'], label='EMA Rating', color='blue')
ax1.set_xlabel('Review Date')
ax1.set_ylabel('Average Rating', color='blue')
ax1.tick_params(axis='y', labelcolor='blue')

# Plot EMA of sentiment on the secondary y-axis
ax2.plot(tdf_resampled['at'], tdf_resampled['ema_sentiment'], label='EMA Sentiment', color='green')
ax2.set_ylabel('Average Sentiment', color='green')
ax2.tick_params(axis='y', labelcolor='green')

plt.title(f'Average Rating and Sentiment Over Time with EMA (Span = {ema_span} days)')
plt.grid(True)
plt.tight_layout()

# Add legends
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')

plt.show()
No description has been provided for this image
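For reference, `ewm(span=s)` sets alpha = 2 / (s + 1); with `adjust=False` it reduces to the simple recurrence below (the cell above uses pandas' default `adjust=True`, which weights early observations slightly differently, but the intuition is the same):

```python
def ema(values, span):
    """Exponential moving average via the recurrence
    y_t = alpha * x_t + (1 - alpha) * y_{t-1}, with alpha = 2 / (span + 1).
    Matches pandas Series.ewm(span=span, adjust=False).mean()."""
    alpha = 2.0 / (span + 1.0)
    out = []
    prev = None
    for x in values:
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out

print(ema([1.0, 2.0, 3.0], span=3))  # → [1.0, 1.5, 2.25]
```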

Sentiment and ratings/score are very closely correlated over time¶

  • There is a steep decrease in the number of reviews before 2019
    • This affects the quality of the graph and of any insights prior to that point
    • Even in the noisy period before 2019, there is still remarkably good alignment between rating and sentiment

This shows that rating/scores can be used as an indicator of sentiment¶

Some thoughts¶

Scores/Ratings are inherently quantitative, making them straightforward to aggregate, compare, and analyze statistically. They offer a clear, albeit simplistic, measure of user sentiment.

  • Scores provide a direct measure of opinion, but they lack context and the nuances of why a user might have given a particular rating. Two users might give the same score for entirely different reasons.

Sentiments are inherently qualitative. Analysis involves evaluating the text of a review to determine the reviewer's subjective feelings or attitudes.

  • It can help reveal the reasons behind a user's score or even offer insights into aspects of the product or service that weren't explicitly rated.
  • Sentiment analysis can be more subjective and depends heavily on the quality of the analysis tools. Ambiguity and varying expressions of sentiment across different cultures and languages can make accurate sentiment analysis challenging.

Comparing scores and sentiment yields complementary insights: scores provide a quick, at-a-glance view of user opinions that is easy to quantify, while sentiment analysis delves deeper into the "why" behind the scores, offering richer, qualitative insights.

The above analysis shows that Scores/Ratings can be used as a pseudo-indicator of Sentiment for this particular app during the period interrogated.
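One way to put a number on that visual correlation would be a Pearson correlation between the daily series. A minimal pure-Python sketch, run here on synthetic daily means (illustrative numbers, not the real data):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Synthetic daily means: rating and sentiment moving together
daily_rating    = [4.2, 3.9, 4.5, 2.8, 3.1, 4.8]
daily_sentiment = [0.30, 0.22, 0.41, 0.02, 0.08, 0.49]
print(round(pearson_r(daily_rating, daily_sentiment), 3))  # close to 1.0
```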

In [81]:
# Check the number of reviews per year
tdf.reset_index(inplace=True)
year_count = tdf['at'].dt.year.value_counts().sort_index()

year_count
Out[81]:
at
2018      55
2019    3584
2020    5267
2021    9044
2022    8354
2023    4012
Name: count, dtype: int64

Thumbs up¶

  • People can give reviews a 'thumbs up'
  • We'll make the assumption that a 'thumbs up' 👍 is an agreement to the review
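The groupby-and-mean used below amounts to the following grouping, sketched here on toy (score, thumbs-up) pairs (illustrative numbers only):

```python
from collections import defaultdict

# Toy (score, thumbsUpCount) pairs -- illustrative numbers only
reviews = [(1, 40), (1, 60), (2, 30), (5, 2), (5, 0), (5, 4)]

totals = defaultdict(lambda: [0, 0])  # score -> [thumbs total, review count]
for score, thumbs in reviews:
    totals[score][0] += thumbs
    totals[score][1] += 1

avg_thumbs = {score: total / n for score, (total, n) in totals.items()}
print(avg_thumbs)  # → {1: 50.0, 2: 30.0, 5: 2.0}
```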
In [83]:
# Calculate the average thumbsUpCount for each rating
average_thumbs_up = df.groupby('score')['thumbsUpCount'].mean()

# Plot the average thumbsUpCount for each rating
plt.figure(figsize=(10, 6))
plt.bar(average_thumbs_up.index, average_thumbs_up.values, color='blue')
plt.xlabel('Rating')
plt.ylabel('Average Thumbs Up Count')
plt.title('Average Thumbs Up Count per Rating')
plt.xticks(average_thumbs_up.index)
plt.show()
No description has been provided for this image

People tend to agree with a review when it is less than positive¶

We'll call a score of 2 the 'elbow point'¶

This result above suggests a few critical insights into user behavior towards the app:

Validation of Negative Experiences: Users tend to agree more frequently with lower-score reviews.

  • This could indicate a shared sentiment of dissatisfaction among a substantial portion of the app's user base.

Critical Engagement: Users might be more inclined to engage with negative reviews as they seek validation for their own experiences.

Opportunities for Improvement: From a developer or app owner's perspective, the prominence of thumbs-up on lower-score reviews serves as a critical feedback loop.

  • It can highlight areas needing urgent attention and improvement. Addressing these commonly agreed-upon issues could significantly enhance user satisfaction and overall app perception.

Community and Empathy: The act of agreeing with reviews, especially negative ones, underscores a community aspect where users feel a sense of solidarity.

What topics matter most to reviewers of the app?¶

To do this, we employ natural language processing and Latent Dirichlet Allocation (LDA) to observe and explain why certain parts of the data are similar¶
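Before the full pipeline, it may help to see what the `corpora.Dictionary`/`doc2bow` step produces: each tokenised review becomes a list of (token_id, count) pairs. A minimal pure-Python equivalent (a sketch of the idea, not gensim's implementation):

```python
from collections import Counter

def build_dictionary(docs):
    """Assign an integer id to each unique token, in order of first
    appearance, much like gensim's corpora.Dictionary."""
    token2id = {}
    for doc in docs:
        for tok in doc:
            if tok not in token2id:
                token2id[tok] = len(token2id)
    return token2id

def doc2bow(doc, token2id):
    """Bag-of-words: sorted (token_id, count) pairs, like Dictionary.doc2bow."""
    counts = Counter(token2id[t] for t in doc if t in token2id)
    return sorted(counts.items())

docs = [["love", "easy", "easy"], ["slow", "love"]]
d = build_dictionary(docs)
print(doc2bow(docs[0], d))  # → [(0, 1), (1, 2)]  (love→0, easy→1)
```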

In [9]:
import gensim
from gensim import corpora
from gensim.models import LdaModel
import pyLDAvis.gensim_models

ldf = df.copy()

# download tokenizer data and stop words for multiple languages as a safety
nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(nltk.corpus.stopwords.words(['english', 'german', 'spanish', 'swedish']))

# additional words to exclude - these words muddle topics
exclude_words = {'ca', 'app', 'ap'}

# tokenize and clean text
tokenized_data = []
for review in ldf['content']:
    if not isinstance(review, str):
        continue
    try:
        if detect(review) == 'en': # detect the language of the review
            tokens = nltk.word_tokenize(review.lower())
            tokens = [word for word in tokens if word.isalpha() and word not in stop_words and word not in exclude_words]
            tokenized_data.append(tokens)
    except LangDetectException:
        continue  # skip the review if language can't be detected

# create a dictionary and corpus
dictionary = corpora.Dictionary(tokenized_data)
corpus = [dictionary.doc2bow(text) for text in tokenized_data]

# generate the LDA model with num_topics calculated in def compute_coherence_values
lda_model = LdaModel(corpus, num_topics=7, id2word=dictionary, passes=15, random_state=1)

lda_display = pyLDAvis.gensim_models.prepare(lda_model, corpus, dictionary, sort_topics=False)
pyLDAvis.display(lda_display)
Out[9]:

Looking at our principal components, we can clearly see 3-ish groups:¶

  • Group 1¶

    • Topics 1, 4, 6
  • Group 2¶

    • Topics 5, 7
  • Group 3¶

    • Topic 2, 3

There is a split across both the PC1 axis and PC2 axis¶

Topic 3 is the most abundant topic within the corpus 👇¶

In [10]:
# Initialize topic proportion counters
topic_counter = np.zeros(7)  # num_topics

# Go through the corpus and get the topic distribution for each document
for doc in corpus:
    topic_distribution = lda_model.get_document_topics(doc)
    for topic, proportion in topic_distribution:
        topic_counter[topic] += proportion

# Normalize the counts to get proportions
topic_proportions = topic_counter / topic_counter.sum()

plt.figure(figsize=(12, 6))
plt.bar(range(1, len(topic_proportions) + 1), topic_proportions)
plt.xlabel('Topic Number')
plt.ylabel('Proportion')
plt.title('Topic Proportions')
plt.show()
No description has been provided for this image
In [11]:
from gensim.models.coherencemodel import CoherenceModel

# Compute Coherence Score
coherence_model_lda = CoherenceModel(model=lda_model, texts=tokenized_data, dictionary=dictionary, coherence='c_v')
coherence_lda = coherence_model_lda.get_coherence()

print(f'Coherence Score: {coherence_lda}')
Coherence Score: 0.6141385120585212

The below function allows us to determine the best number of topics¶

  • By calculating a coherence score, we can tweak the model to get the best results
  • It's an incredibly slow process, do not run it more than once 😅
In [27]:
def compute_coherence_values(dictionary, corpus, texts, limit, start=2, step=3):
    coherence_values = []
    model_list = []
    for num_topics in range(start, limit, step):
        model = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=15)
        model_list.append(model)
        coherencemodel = CoherenceModel(model=model, texts=texts, dictionary=dictionary, coherence='c_v')
        coherence_values.append(coherencemodel.get_coherence())
    return model_list, coherence_values

# Function call
model_list, coherence_values = compute_coherence_values(dictionary=dictionary, corpus=corpus, texts=tokenized_data, start=2, limit=40, step=4)

# Plotting
limit=40; start=2; step=4;
x = range(start, limit, step)
plt.plot(x, coherence_values)
plt.xlabel("Num Topics")
plt.ylabel("Coherence score")
plt.legend(['coherence score'], loc='best')  # wrap in a list; a bare string is treated as a sequence of characters

plt.minorticks_on()
plt.grid(which='minor', linestyle=':', linewidth='0.5', color='gray')

plt.show()
No description has been provided for this image

How coherent is this data?¶

Does the language make sense, and can meaning be identified?¶

  • Yes, with a score of 0.614 the coherence is usable
  • A coherence score of 0.614 across 7 topics represents a solid foundation for extracting meaning from the data

Generate key words for each topic¶

In [13]:
# Show top N keywords for each topic
top_topics = lda_model.show_topics(num_topics=7, num_words=20, formatted=False)

for i, topic in enumerate(top_topics):
    print(f"Topic {i+1}:")
    print(", ".join([word[0] for word in topic[1]]))
Topic 1:
card, time, get, let, use, tried, phone, even, account, ghost, email, try, number, one, waste, work, trying, cant, says, log
Topic 2:
pay, love, buy, great, get, way, need, payments, things, want, like, thank, much, later, recommend, able, best, time, helps, thanks
Topic 3:
love, great, easy, use, good, best, shopping, service, awesome, convenient, far, really, payment, shop, helpful, experience, payments, way, absolutely, brilliant
Topic 4:
credit, time, used, use, purchase, never, using, paid, good, power, better, even, purchases, afterpay, limit, approved, payment, always, made, make
Topic 5:
service, customer, company, help, support, bank, use, know, ever, people, never, worst, take, scam, issue, horrible, account, contact, bad, give
Topic 6:
payment, order, pay, money, purchase, card, make, account, payments, first, get, still, back, due, got, one, item, bank, days, never
Topic 7:
work, slow, update, works, open, keeps, working, load, like, needs, takes, fix, screen, please, even, see, trying, page, website, get

Pass the topics to an LLM to identify themes¶

Topics and Their Themes¶

Topic 1: User Interface and Access Issues

  • Keywords: card, time, get, let, use, tried, phone, even, account, ghost, email, try, number, one, waste, work, trying, cant, says, log
  • Theme: This topic appears to focus on issues related to using the app, especially problems with logging in, account access, and functionality on mobile devices. "Ghost" might refer to temporary or virtual cards not working as expected, suggesting frustrations with financial transactions or account management.

Topic 2: Positive Feedback on Functionality and Convenience

  • Keywords: pay, love, buy, great, get, way, need, payments, things, want, like, thank, much, later, recommend, able, best, time, helps, thanks
  • Theme: Reviews in this topic seem to express appreciation for the app's payment features, especially for making purchases and managing payments over time. The repeated expressions of gratitude and recommendations indicate high user satisfaction with the app's convenience and ease of use.

Topic 3: Excellence in Shopping Experience

  • Keywords: love, great, easy, use, good, best, shopping, service, awesome, convenient, far, really, payment, shop, helpful, experience, payments, way, absolutely, brilliant
  • Theme: This topic highlights the app's excellence in providing a seamless shopping experience, emphasizing the ease of use, convenience, and customer service. Words like "awesome," "brilliant," and "best" suggest a very positive user perception, particularly regarding the shopping and payment functionalities.

Topic 4: Financial Features and Credit Management

  • Keywords: credit, time, used, use, purchase, never, using, paid, good, power, better, even, purchases, afterpay, limit, approved, payment, always, made, make
  • Theme: Focused on the financial aspects of the app, including credit use, purchase management, and comparisons with similar services like Afterpay. Issues with credit limits and approval processes are also touched upon, along with the reliability of making and managing payments.

Topic 5: Customer Service and Support Concerns

  • Keywords: service, customer, company, help, support, bank, use, know, ever, people, never, worst, take, scam, issue, horrible, account, contact, bad, give
  • Theme: Central to this topic are criticisms of customer service and support, with strong negative sentiment expressed through words like "worst," "scam," "horrible," and "bad." Users express frustration with how the company handles support issues, account problems, and interactions with the bank.

Topic 6: Transaction and Payment Issues

  • Keywords: payment, order, pay, money, purchase, card, make, account, payments, first, get, still, back, due, got, one, item, bank, days, never
  • Theme: This topic deals with specific grievances related to transactions, payments, and financial dealings through the app. There are mentions of delays, problems with orders and refunds, and difficulties in making or receiving payments.

Topic 7: Technical Performance and Usability

  • Keywords: work, slow, update, works, open, keeps, working, load, like, needs, takes, fix, screen, please, even, see, trying, page, website, get
  • Theme: Concerns about the app's technical performance, including issues with updates, slowness, loading times, and general usability problems. Users are asking for improvements and fixes to enhance the app's functionality and user experience.

Wordcloud per score/rating¶

In [20]:
import pandas as pd
from wordcloud import WordCloud
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
from collections import Counter
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.util import ngrams
import nltk
import random 

# Set a random seed for reproducibility
random.seed(42)

nltk.download('punkt')
nltk.download('stopwords')

ddf = df.copy()

custom_exclude = ['app', 'ap', 'ca', 'appen']

colors = ["#0000FF", "#FF69B4"]
cmap = LinearSegmentedColormap.from_list("custom", colors, N=256)

# Function to generate word cloud
def generate_word_cloud(text):
    wordcloud = WordCloud(width=800, height=400, background_color='white', colormap=cmap).generate_from_frequencies(text)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()

stop_words = set(stopwords.words('english'))
#stop_words.update(stopwords.words('swedish'))
#stop_words.update(stopwords.words('german'))
#stop_words.update(stopwords.words('norwegian'))
#stop_words.update(stopwords.words('spanish'))

# Group the DataFrame by 'score' and combine the 'content' text for each group
grouped_reviews = ddf.groupby('score')['content'].apply(lambda x: ' '.join(x.dropna().astype(str))).reset_index()

# Generate word clouds for each rating group
for _, row in grouped_reviews.iterrows():
    print(f"Word Cloud for Rating {row['score']}")
    tokens = [word for word in word_tokenize(row['content'].lower()) if word not in stop_words and word not in custom_exclude and word.isalpha()]
    word_freq = Counter(tokens)
    bigrams = list(ngrams(tokens, 2))
    bigram_freq = Counter(map(lambda x: ' '.join(x), bigrams))
    
    merged_freq = word_freq + bigram_freq
    
    generate_word_cloud(merged_freq)
Word Cloud for Rating 1
No description has been provided for this image
Word Cloud for Rating 2
No description has been provided for this image
Word Cloud for Rating 3
No description has been provided for this image
Word Cloud for Rating 4
No description has been provided for this image
Word Cloud for Rating 5
No description has been provided for this image
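The Counter addition used above to merge unigram and bigram frequencies can be shown standalone (here using zip in place of nltk's ngrams):

```python
from collections import Counter

tokens = ["easy", "to", "use", "easy", "to", "love"]

word_freq = Counter(tokens)
# Bigrams via zip of the list against itself offset by one
bigram_freq = Counter(" ".join(pair) for pair in zip(tokens, tokens[1:]))

# Counter addition merges the two frequency tables key-wise
merged_freq = word_freq + bigram_freq
print(merged_freq["easy"])     # → 2
print(merged_freq["easy to"])  # → 2
```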

Wordcloud per topic¶

In [19]:
colors = ["#0000FF", "#FF69B4"]
cmap = LinearSegmentedColormap.from_list("custom", colors, N=256)

# Function to generate word cloud from word frequencies
def generate_word_cloud_topic(word_freqs):
    wordcloud = WordCloud(width=800, height=400, background_color='white', colormap=cmap).generate_from_frequencies(word_freqs)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis("off")
    plt.show()

top_topics = lda_model.show_topics(num_topics=7, num_words=200, formatted=False)

for topic_num, words in top_topics:
    print(f"Word Cloud for Topic {topic_num+1}")
    
    word_freqs = {word: prob for word, prob in words}
    
    generate_word_cloud_topic(word_freqs)
Word Cloud for Topic 1
No description has been provided for this image
Word Cloud for Topic 2
No description has been provided for this image
Word Cloud for Topic 3
No description has been provided for this image
Word Cloud for Topic 4
No description has been provided for this image
Word Cloud for Topic 5
No description has been provided for this image
Word Cloud for Topic 6
No description has been provided for this image
Word Cloud for Topic 7
No description has been provided for this image

Final Thoughts¶

  • The goals of this analysis were:
  1. Can the reviews be quantified?

    • Reviews can be quantified and some basic takeaways can be extracted. First, reviewers of this app tend towards a score of either 1 or 5.
    • It would be interesting to see if this is true of other apps' reviews, but here the 1-to-5 scale is used much like a binary (like/dislike) rating scale.
    • Despite the scale used, 54% of reviews in the sample were scored 5.
    • There seems to be a trend of people 'liking' or giving a thumbs-up to reviews with a lower score, the assumption being that it validates a user's negative experience quickly and efficiently.
  2. Are we able to gauge opinions?

    • Yes, and in multiple ways.
    • We can use Latent Dirichlet Allocation (LDA) to find thematic structure amongst reviews.
      • LDA is a generative probabilistic model which requires fine tuning and thoughtful interpretation of results as topics are not labelled.
    • Topics could then be passed to an LLM (ChatGPT) to create themes based on the input.
    • Many of the topics revolved around expected themes when dealing with a 'buy now, pay later' app.
      • This further helped confirm that the topics generated were meaningful.
    • Topics 2 and 3 comprised about 40% of the total proportion of topics.
      • Topics 2 and 3 revolved around 'Positive Feedback on Functionality and Convenience' and 'Excellence in Shopping Experience'.
      • Both of which are positive topics and correlate nicely with our score 5 ratings.
    • Topics 1, 4 and 6 related to themes such as UX and payments.
      • Almost always negative themes mentioning issues, delays and reliability.
    • Topics 5 and 7 are also UX related, but concern experiences both within and outside of the app.
      • Also negative topics in regards to customer service, banking issues and general 'buggy' app issues.
  3. Can we visualise items or topics of interest in reviews?

    • The idea of visualising text and themes is nightmarish, but wordclouds are a fun and interesting way to do so.
    • Visualising the words used, and scaling each word's size by its frequency, really helps to drive home the message of these types of analyses.
    • Word clouds could be generated for various groupings (e.g. scores or topics).
    • I believe this gives those without a background in data analysis a solid understanding of reviewer/user thoughts without the need for bar or line graphs.
In [ ]: